-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
try opening with contextmanager #370
try opening with contextmanager #370
Conversation
--- a/fsspec/implementations/http.py
+++ b/fsspec/implementations/http.py
@@ -642,7 +642,7 @@ class HTTPFile(AbstractBufferedFile):
self.url,
self.mode,
self.blocksize,
- self.cache.name,
+ self.cache.name if self.cache else "none",
self.size,
),
) ? -edited- |
This PR works with fsspec/filesystem_spec#973! 🎉 Thanks for the suggestion Martin! |
pangeo_forge_recipes/transforms.py
Outdated
try: | ||
ds = xr.open_dataset(open_file, **self.xarray_open_kwargs) | ||
except ValueError: | ||
with open_file as fp: | ||
ds = xr.open_dataset(fp, **self.xarray_open_kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The goal is to remove the try / except block and just have consistent behavior with all open files. But If I try to just always use the context manager, like this,
try: | |
ds = xr.open_dataset(open_file, **self.xarray_open_kwargs) | |
except ValueError: | |
with open_file as fp: | |
ds = xr.open_dataset(fp, **self.xarray_open_kwargs) | |
with open_file as fp: | |
ds = xr.open_dataset(fp, **self.xarray_open_kwargs) |
I get the error
E TypeError: cannot pickle '_io.BufferedReader' object
[while running 'Create|OpenWithFSSpec|OpenWithXarray/OpenWithXarray/Open with Xarray']
This only happens when opening from the cache. In that case, we are not using fsspec.open
but rather using the filesystem open
method that is attached to the cache object, i.e.
pangeo-forge-recipes/pangeo_forge_recipes/storage.py
Lines 129 to 133 in a8c951f
def open_file(self, path: str, **kwargs) -> OpenFileType: | |
"""Returns an fsspec open file""" | |
full_path = self._full_path(path) | |
logger.debug(f"returning open file for {full_path}") | |
return self.fs.open(full_path, **kwargs) |
So we need to try to reconcile the behavior of get_opener
with cache.open_file
, such that they return the same type of open file object.
I've read some of the discussion around this issue, and I'd like to offer a few pointers. To achieve the goals described here, consider using Apache Beam's Filesystems API. It requires no extra dependency besides Beam. I've used it to solve a similar problem here: https://github.com/google/xarray-beam/blob/main/xarray_beam/_src/pangeo_forge.py#L115 |
Alex, I appreciate the suggestion to look into the beam filesystems. I just read up on them and they look nice. However, you must understand that fsspec is too tightly intertwined into our stack to abandon at this stage. Xarray and Zarr both have very helpful integrations with and and optimizations for fsspec filesystems. This PR seems to work now with fsspec/filesystem_spec#973. Surfacing issues and making upstream improvements for fsspec is an important secondary goal for our project, and that's what we will continue to do for now. |
As predicted, the I'm going to merge this for now. @martindurant would be goo to get an fsspec soon 🙏 but we can move forward with the upstream dev environment. |
This is one possible workaround for fsspec/filesystem_spec#579
This fails with the following error